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ABSTRACT 


We  consider  a  coherent  system  S  consisting  of  m  independent  components  for 
which  we  do  not  know  the  distributions  of  the  components'  lifelengths.  If  we  know 
the  structure  function  of  the  system,  then  we  can  estimate  the  distribution  of  the 
system  lifelength  by  estimating  the  distributions  of  the  lifelengths  of  the  individual 
components.  Suppose  that  we  can  collect  data  under  the  ‘autopsy  model’,  wherein  a 
system  is  run  until  a  failure  occurs  and  then  the  status  (functioning  or  dead)  of  each 
component  is  obtained.  This  test  is  repeated  n  times.  The  autopsy  statistics  consist 
of  the  age  of  the  system  at  the  time  of  breakdown  and  the  set  of  parts  that  are  dead 
by  the  time  of  breakdown.  Using  the  structure  function  and  the  recorded  status  of 
the  components,  we  then  classify  the  failure  time  of  each  component.  We  develop  a 
nonparametric  Bayesian  estimate  of  the  distributions  of  the  component  lifelengths 
and  then  use  this  to  obtain  an  estimate  of  the  distribution  of  the  lifelength  of  the 
system.  The  procedure  is  applicable  to  machine-test  settings  wherein  the  machines 
have  redundant  designs.  A  parametric  procedure  is  also  given. 
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system.  The  procedure  is  applicable  to  machine-test  settings  wherein  the  machines 
have  redundant  designs.  A  parametric  procedure  is  also  given. 


CHAPTER  1 

INTRODUCTION  AND  SUMMARY 


Consider  a  coherent  system  S  consisting  of  m  independent  components  for  which 
we  do  not  know  the  distributions  of  the  component  lifelengths.  Assume  that  each  of 
the  m  components  occupies  one  of  two  sta  es,  functioning  or  failed.  We  consider  the 
statistical  model  in  which  each  element  of  a  sample  of  n  replicates  of  S  is  observed 
until  it  fails.  The  observed  data  consist  of  the  set  of  components  that  are  in  a  failed 
state  and  the  failure  time  of  the  system.  The  failure  times  of  the  dead  components 
are  not  directly  observed.  The  set  of  dead  components  and  the  system  failure  time 
comprise  the  “autopsy  statistics”  of  the  system.  This  model  is  usually  called  the 
autopsy  model. 

Two  statistical  problems  arise  in  considering  the  autopsy  model — the  problems 
of  estimating  the  distributions  of  the  component  lifelengths  and  the  distribution  of 
the  entire  system’s  lifelength.  One  approach  to  estimating  the  distribution  of  the 
system  lifelength  is  to  use  only  the  observed  system  failure  times.  For  example,  we 
could  use  the  empirical  distribution  function.  However,  such  an  approach  ignores  the 
(partial)  information  we  have  about  the  components  of  the  system.  If  the  structure 
function  of  the  system  is  known,  the  distribution  of  the  lifelength  of  the  system 
can,  in  general,  be  calculated  from  knowledge  of  the  distributions  of  the  component 
lifelengths.  Hence,  an  alternative  approach  would  be  to  estimate  the  distributions  of 
the  lifelengths  of  the  m  components  of  system  S  and  then  use  the  structure  function 
of  S  to  estimate  the  distribution  of  the  system  lifelength. 
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Clearly,  the  component  information  provided  by  the  autopsy  model  is  quite  lim¬ 
ited.  It  is  reasonable  to  consider  alternative  testing  procedures  which  provide  more 
component  information.  However,  the  autopsy  model  is  important  when  alternative 
testing  procedures  such  as  separate  testing  of  components  are  not  possible  or  prac¬ 
tical.  For  example,  it  may  be  difficult  to  reproduce  the  conditions  which  exist  in 
the  functioning  system  when  components  are  tested  separately.  For  such  systems, 
it  is  important  to  obtain  every  bit  of  information  about  the  components  when  they 
are  parts  of  a  machine  or  other  system.  In  these  settings,  as  well  as  the  case  of  cer¬ 
tain  biological  systems,  the  autopsy  model  is  natural.  Probabilistic  aspects  of  the 
autopsy  model  were  considered  by  Meilijson  (1981),  Nowick  (1990),  and  Antoine, 
Doss,  and  Hollander  (1993).  Inferential  aspects  of  the  autopsy  model  have  been 
considered  only  by  Watelet  (1990)  and  Meilijson  (1994). 

The  U.S.  Air  Force’s  C-17  transport  airplane’s  Fuel  Quantity  (FQ)  computer 
is  an  example  of  a  system  in  which  the  “component”  is  a  logical  subsystem  whose 
status  can  be  readily  determined  in  the  field.  The  FQ  computer  is  a  parallel  system 
of  order  two,  consisting  of  an  “A”  bus  and  a  “B”  bus.  When  this  system’s  data 
were  first  examined  by  the  authors,  there  were  approximately  2440  cumulative  flying 
hours  spread  among  six  different  prototype  C-17’s,  each  of  which  contained  one  FQ 
computer.  We  present  a  data  analysis  using  our  procedure  in  Chapter  3. 

Consider  the  autopsy  statistics  as  described  above.  Assume  that  the  system 
we  wish  to  study  is  a  coherent  system  (see  Chapters  1  and  2  of  Barlow  and 
Proschan  (1975)).  We  can  use  the  structure  function,  the  set  of  dead  components, 
and  the  failure  time  of  the  system  to  say  more  about  the  failure  times  of  each  com- 
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ponent.  Specifically,  for  each  of  the  n  replicates  of  5,  we  can  classify  the  failure  time 
of  each  of  the  m  components  as  follows: 

[Cl]  Component  failure  time  is  greater  than  system  failure  time. 

[C2]  Component  failure  time  is  less  than  system  failure  time. 

[C3]  Component  failure  time  is  equal  to  system  failure  time. 

[C4]  Component  failure  time  is  either  less  than  or  exactly  equal  to  system  failure 
time,  but  we  cannot  tell  which. 

The  first  two  categories  correspond  to  what  are  usually  termed  right- censored 
data  and  left-censored  data,  respectively.  Right-censored  data  occur  when  the  com¬ 
ponent  is  still  alive  when  the  system  fails.  Left-censored  data  occur  when  the  com¬ 
ponent  is  dead  and  information  contained  in  the  structure  function,  along  with 
information  contained  in  the  set  of  dead  components,  allow  one  to  deduce  that  the 
component’s  death  occurred  prior  to  the  system  failure  time.  Similarly,  the  third 
category  arises  when  a  component  is  observed  in  a  failed  state  and  we  deduce  that 
it  caused  the  system  to  fail.  The  last  category  occurs  whenever  all  the  components 
in  a  redundant  system  or  a  redundant  subsystem  belong  to  the  set  of  dead  compo¬ 
nents.  In  this  case,  we  know  there  is  exactly  one  component,  whose  identity  we  do 
not  know,  with  a  failure  time  equal  to  that  of  the  system,  while  the  failure  times  of 
all  the  other  components  axe  strictly  less  than  that  of  the  system. 

The  last  category  above  is  problematic,  particularly  in  frequentist  settings,  and 
is  at  the  heart  of  the  issue  of  “identifiability”.  (General  references  on  identifiability 
are  given  at  the  end  of  this  chapter.)  For  example,  consider  a  parallel  system  of 


order  two.  Let  the  distributions  for  components  1  and  2  be  F\  and  F2,  respectively. 
Both  components  will  always  be  dead  when  the  system  is  observed  in  a  failed  state. 
Under  the  autopsy  model,  one  can  only  observe  the  system  failure  time,  which  has 
distribution  F1F2.  This  simple  redundant  system  is  not  identifiable  in  the  frequentist 
sense  in  that  it  is  not  possible  to  determine  F\  and  Ft  from  a  knowledge  of  the 
distribution  of  the  observed  data. 

We  use  a  Bayesian  framework  because,  for  the  applications  we  have  in  mind,  we 
have  a  small  amount  of  data  but  have  extensive  past  experience  on  the  components 
in  other  systems.  For  example,  in  the  case  of  the  C-17  FQ  computer,  similar  but 
not  identical  FQ  computers  exist  in  other  aircraft  for  which  there  has  been  more 
extensive  testing.  Thus,  we  have  strong  reasons  to  suspect  the  distribution  of  the 
lifelengths  of  the  components  approximately  comes  from  a  certain  parametric  family, 
and  we  have  some  knowledge  about  the  parameters  of  this  distribution.  A  Bayes 
procedure  makes  sense  here  since  we  have  only  limited  testing  hours  on  just  six 
FQ  systems,  each  within  a  C-17,  and  we  wish  to  estimate  the  distribution  of  the 
lifelength  of  the  FQ  system  when  it  is  part  of  the  C-17. 

We  consider  a  Bayesian  framework  for  estimation  of  the  distributions  of  com¬ 
ponent  lifelengths  in  which  the  prior  distributions  on  each  of  the  F,’s  give  most  of 
their  mass  to  “small  neighborhoods  of  a  parametric  family”.  The  prior  distributions 
which  we  use  are  derived  from  the  Dirichlet  process  priors  discussed  by  Ferguson 
(1973,  1974).  The  Dirichlet  process  priors  are  probability  measures  on  V  parame¬ 
terized  by  the  set  of  all  finite  non-null  measures  on  the  real  line  1Z,  where  V  is  the 
space  of  all  probability  measures  on  71.  Let  a  be  a  finite  non-null  measure  on  the 
Borel  sets  of  71.  The  random  distribution  function  F  is  said  to  have  a  Dirichlet 
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process  prior  distribution  with  parameter  a,  denoted  X>a,  if  for  every  measurable 
partition  {B\, . . . ,  Bt}  of  ft,  the  random  vector  ( F{B\ ), . . . ,  F(Bg ))  has  the  Dirich- 
let  distribution  with  parameter  vector  (a(B\), . . .  ,a(Bt))  (here  and  throughout  the 
rest  of  the  paper,  probability  measures  are  identified  with  their  <  tive  distri¬ 

bution  functions,  and  the  same  symbol  is  used  to  denote  both  a  u.  .tsure  and  its 
distribution  function  whenever  convenient).  When  a  prior  distribution  is  put  on  ft, 
then  for  every  t  €  ft,  the  quantity  F(t)  is  a  random  variable.  Write  H  =  a/a(ft), 
so  that  H  is  a  probability  measure  on  ft.  If  F  ~  VQ ,  then  EF(t)  =  H(t),  wuile  the 
quantity  a(ft)  indicates  the  degree  of  concentration  of  Va  around  its  “center”  H. 
For  example,  it  is  well  known  that  as  a(ft)  —*  oo,  Va  converges  to  the  point  mass  at 
H  in  the  weak  topology.  Ferguson  (1973)  showed  that  the  Dirichlet  priors  have  the 
property  that  the  support  of  Va  is  the  set  of  all  probability  measures  whose  support 
is  contained  in  the  support  of  H.  For  example,  if  the  support  of  H  is  the  positive 
real  axis,  then  the  support  of  Va  is  the  set  of  distributions  of  all  positive  random 
variables.  Ferguson  also  showed  that  if  F  ~  £>a,  then  F  is  a.s.  discrete. 

The  priors  on  each  of  the  ft’s,  i  =  1, . . . ,  m,  that  we  use  are  mixtures  of  Dirichlet 
priors.  To  keep  the  notation  less  cumbersome,  let  m  =  2.  For  component  1,  we 
consider  a  parametric  family  He,  9  €  0,  and  put  a  mixture  of  Dirichlets  as  the 
prior  on  F\.  That  is, 

ft-  JvQgv(d0), 

where  for  each  0  €  0,  —  a 0  <  a#(ft)  <  oo,  and  v  is  a  probability  mea¬ 

sure  on  0.  Similarly,  for  component  2  we  consider  the  parametric  family  K^,  ip  € 
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and  put  as  prior  on  F2  the  mixture  /  Vg^  p(dip).  For  each  ip  €  'P,  /3V,  =  3u,{'R)h\, 
0  <  ^(R)  <  oo,  and  p  is  a  probability  measure  on 

We  use  a  nonpar ametric  Bayesian  procedure  in  which  we  put  a  Dirichlet  process 
prior  on  each  of  the  F{  s.  This  has  two  advantages.  First,  we  protect  against  the 
problems  associated  with  using  an  incorrectly  specified  parametric  model,  such  as 
obtaining  an  inconsistent  estimator.  Second,  we  can  avoid  the  loss  of  efficiency  due 
to  ignoring  partied  information  we  may  have  about  a  parametric  model,  since  we 
use  prior  distributions  (on  each  of  the  F,’s)  that  concentrate  their  mass  around  the 
hypothesized  parametric  family. 

Our  approach  is  based  on  the  Gibbs  sampling  algorithm  as  discussed  in  Gelfand 
and  Smith  (1990).  We  now  review  the  algorithm.  Let  fyx . yp  be  the  joint  dis¬ 
tribution  of  the  (possibly  vector- valued)  random  variables  We  suppose 

that  we  do  not  know  the  form  of  /y,,...typ,  but  that  we  know  the  conditional  dis¬ 
tributions  fYi\Yj,frii  1  —  or  that  at  least  we  are  able  to  generate  observa¬ 

tions  from  these  conditional  distributions.  Suppose  we  want  to  sample  observations 
from  the  joint  distribution  of  the  random  variables  Vj, . . . ,  Vj,,  or  simply  an  ob¬ 
servation  from  one  of  the  p  marginals.  The  algorithm  to  generate  an  observation 
from  /yi  ,...,yp  proceeds  as  follows.  We  fix  arbitrary  starting  values  Y}0\ . . . ,  Vp(0) 
and  then  update  these  values.  Draw  V"/1’  from  /y,|K,,>#i(*i  Y^,  •  •  •  *  Next, 

draw  Y2(l)  from  /y2\y,,  j&{Yil\  •,  T30*, . . . ,  V^0*).  Continue  until  we  draw  Vp(1) 
from  fYp\Yj,  j#p0'i^5  •  •  •  >^p-x,*).  We  have  now  completed  one  iteration  of  the 
scheme  by  visiting  each  variable.  After  k  iterations,  we  have  the  random  vari¬ 
able  (Yxk\  . . . ,  YjW).  The  sequence  (Y^\  . . . ,  VjP),  j  =  1,2, . . .,  is  a  Markov  chain 
and  /vi  ,...,yp  is  stationary  distribution  of  the  chain.  If  one  can  establish  that  the 


< 


chain  converges  in  distribution  to  /y, . yp,  then  (for  large  k)  {Y^ _ _  Yjkij  has  a 

distribution  which  is  approximately  equal  to  /y, . yp.  Such  observations  can  be  used 

to  estimate  /y, . yp. 

Perhaps  the  most  natural  way  to  implement  the  Gibbs  sampler  here  is  to  proceed 
as  is  normally  done  in  a  Bayesian  analysis  of  missing  data  problems  under  conjugacy. 
That  is,  consider  the  pair  (parameter  0,  missing  data):  In  such  a  setup,  if  we  knew 
the  missing  data,  we  would  easily  be  able  to  find  the  conditional  distribution  of  the 
parameter  0,  and  if  we  knew  the  parameter  0  we  would  be  able  to  generate  the  miss¬ 
ing  data.  Indeed,  this  is  precisely  the  approach  taken  by  Doss  (1994),  who  considers 
the  use  of  Dirichlet  priors  for  the  problem  of  estimating  an  unknown  distribution  F 
in  the  presence  of  censoring.  He  considers  random  variables  Xi, . . . ,  Xn  ~  F,  but 
only  observes  X,  €  A,,  where  A,  is  a  singleton  if  X,  is  uncensored  and  A,  =  (cj,  oo) 
if  is  censored  on  the  right  by  Cj.  His  approach  is  based  on  a  Gibbs  sampling 
algorithm  of  length  2  involving  (X,  F);  that  is,  he  generates  X  from  £data( X  |  F) 
and  F  from  C^txJ^F  \  X),  where  data  consists  of  the  sets  A i, . . . ,  An.  (Here,  we  use 
the  notation  £w3(Wi)  or  C(Wi  |  W2)  to  denote  the  conditional  distribution  of  the 
random  variable  W\  given  the  random  variable  W^.)  The  details  of  carrying  out  the 
steps  of  the  algorithm  rely  on  a  constructive  definition  of  the  Dirichlet  prior,  given 
in  Sethuraman  (1994). 

Our  initial  approach  was  simply  to  extend  the  technique  of  Doss  (1994)  to  our 
setting;  however,  the  approach  fails  since  it  turns  out  that  the  procedure  produces 
a  Markov  chain  which  does  not  converge  to  the  posterior  distribution.  We  present 
an  entirely  different  algorithm  that  produces  a  Markov  chain  which  we  argue  (in 
Chapter  2)  does  converge  to  the  posterior  distribution. 
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We  now  give  a  preliminary  explanation  of  our  procedure  for  the  case  where  the 
priors  on  each  of  the  F,’s  are  single  Dirichlets;  i.e.  F,  ~  Z>Qi.  Let  S,,  i  =  1, . . .  ,n, 
be  the  vector  of  lifelengths  of  the  m  components  (for  system  i)  if  we  could  see 
them  all.  Let  data  be  the  set  of  autopsy  statistics  for  the  n  systems.  (It  may 
be  helpful  to  think  of  the  case  of  parallel  systems.  In  this  case,  data  consists 
simply  of  the  n  system  failure  times.)  The  algorithm  proceeds  as  follows.  Fix 
arbitrary  starting  values  S{0), . . . ,  S^°K  Generate  Sj1*  ~  £dat»(Si  |  ■  ■  ■ ,  S^). 

Next,  generate  5^  ~  I  •S’j1*,  S^\ . . . ,  S^).  Continue  until  we  generate 

~  |  5^, . . . ,  S^li ).  We  have  now  completed  one  iteration  of  the 

procedure.  We  repeat  the  procedure  a  large  number  of  times  and  use  the  realizations 
of  the  chain  to  estimate  £d*u(Si,  •  -  •  >  Sn).  There  are  two  key  points  that  allow  this 
procedure  to  produce  a  Markov  chain  which  converges  to  the  posterior  distribution. 

•  The  distribution  of  the  lifelength  of  the  jth  component  of  system  S,  is  condi¬ 
tional  on  the  observed  data  and  on  the  current  set  of  lifelengths  for  the  jth 
components  of  Sf,t  /  *. 

•  The  joint  unconditional  distribution  of  the  n  lifelengths  of  component  j  can 
be  described  in  full. 

The  estimate  of  £<w(*S'i, . . . ,  S„)  can  be  used  to  obtain  an  estimate  of 
£data(Fi,  .  .  .  ,  Fm). 

Note  that,  in  contrast  to  Doss  (1994),  we  deal  only  with  the  random  lifelengths 
(of  the  components)  and  bypass  entirely  the  problem  of  generating  the  infinite¬ 
dimensional  Fj's. 
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In  Chapter  2,  we  explore  the  algorithm  further.  In  Chapter  2.1  we  give  a  Gibbs 
sampling  algorithm  for  our  problem  under  parametric  assumptions.  In  Chapter  2.2, 
we  present  a  simple  extension  of  the  procedure  of  Doss  (1994)  to  our  setting  and  show 
that  this  leads  to  a  reducible  Markov  chain.  In  Chapter  2.3,  we  describe  in  detail 
our  new  algorithm  for  the  case  of  parallel  systems  and  give  an  heuristic  argument 
that  the  Markov  chain  used  in  the  algorithm  converges  to  the  posterior  distribution. 
We  also  establish  this  rigorously  for  a  parallel  system  and  argue  that  the  procedure 
converges  to  the  posterior  distribution  for  a  general  system.  In  Chapter  2.4,  we 
discuss  the  implementation  of  the  algorithm  for  the  case  of  arbitrary  systems,  for 
the  case  of  arbitrary  systems  and  discuss  identifiability  and  frequentist  consistency. 
In  Chapter  3  we  illustrate  our  nonparametric  procedure  on  data  pertaining  to  the 
C-17  FQ  computer. 

The  last  portion  of  this  chapter  is  used  to  summarize  other  relevant  research 
in  the  statistical  literature.  First,  we  define  the  term  identifiability.  Suppose  sys¬ 
tem  S  has  m  components  and  the  component  distributions  axe  denoted  F\, . . . ,  Fm. 
If  Fx,. . .  ,Fm  can  be  recovered  from  a  knowledge  of  the  true  distributions  of  the 
autopsy  statistics,  then  we  say  that  Fi,...,Fm  are  identifiable  and  that  S  is  an 
identifiable  system.  For  example,  any  parallel  system  is  nonidentifiable.  Meilijson 
(1981),  Nowick  (1990),  and  Antoine,  Doss,  and  Hollander  (1993)  considered  prob¬ 
abilistic  aspects  of  the  autopsy  model.  In  these  three  papers,  the  authors  identify 
conditions  on  the  structure  function  of  the  system  and  on  the  distributions  of  the 
component  lifelengths  that  guarantee  identifiability. 

Relevant  work  on  estimating  the  distribution  of  the  system  lifelength  under  the 
autopsy  model  by  first  estimating  the  distributions  of  the  component  lifelengths  and 
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then  using  the  structure  function  of  the  system  can  be  found  in  Watelet  (1990)  and 
Meilijson  (1994).  Watelet  (1990)  considered  two  estimators  for  the  autopsy  model. 
He  developed  nonparametric  estimators  for  the  Fj's.  Meilijson  (1994)  estimated  the 
parameters  of  these  distributions  from  the  empirical  estimate  of  C(Z,  D),  where  Z 
is  the  lifetime  of  the  machine  and  D  is  the  diagnostic  set  of  parts  that  had  died 
by  time  Z,  by  maximum  likelihood  from  incomplete  data  via  the  EM  algorithm. 
Meilijson  assumes  the  Fj's  are  drawn  from  well-behaved  parametric  families. 

Under  the  assumption  that  the  failure  times  of  the  dead  components  are  known, 
Doss,  Freitag,  and  Proschan  (1989)  also  considered  inferential  aspects  of  estimating 
the  distribution  of  the  lifelength  of  the  system  by  first  estimating  the  distribution 
of  the  lifelengths  of  the  system’s  components.  In  their  model,  they  have  more 
information  than  is  available  under  the  autopsy  model. 


CHAPTER  2 

DEVELOPMENT  AND  CONVERGENCE  OF  THE  ALGORITHM 


As  mentioned  earlier,  the  Gibbs  sampling  algorithm  is  ideal  for  missing  data 
problems  in  a  Bayesian  model  under  conjugacy;  see  the  linkage  example  or  the 
Dirichlet  sampling  process  example  in  Tanner  and  Wong  (1987),  or  Example  1  in 
Casella  and  George  (1992).  In  the  present  problem,  we  describe  the  Gibbs  sampling 
algorithm  for  this  problem  in  the  parametric  case,  applied  to  parallel  systems.  We 
then  show  that  a  naive  extension  of  the  simple  censored  data  case  considered  by  Doss 
(1994)  leads  to  a  reducible  Markov  chain  (when  applied  to  parallel  systems).  Next, 
we  review  our  procedure  for  the  case  of  parallel  systems  and  show  that  it  produces 
an  irreducible  Markov  chain  which  converges  to  the  correct  stationary  distribution. 
We  then  argue  that  the  procedure  converges  to  the  correct  stationary  distribution 
for  a  general  system.  Lastly,  we  discuss  the  implementation  of  the  algorithm  for  the 
case  of  arbitrary  systems. 

In  the  description  of  the  Gibbs  sampler  in  Chapter  1,  we  described  how  to 

generate  an  observation  (Yxk\ . . . ,  Y^)  with  distribution  approximately  fy1 . yp  by 

running  the  chain  sufficiently  long.  One  can  estimate  fyx . yp  by  generating  G  such 

chains  in  parallel,  obtaining  independent  observations  {Y^k'l\ . . . ,  K*fc,G*),  or  by 
running  one  very  long  chain.  For  a  discussion  of  the  advantages  and  disadvantages 
of  these  (and  other)  schemes,  see  Geyer  (1992). 
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2.1  A  Gibbs  Sampling  Algorithm  for  the  Autopsy  Model  under 

Parametric  Assumptions 

To  keep  the  notation  as  light  as  possible,  we  shall  consider  a  parallel  system 
of  order  2  for  which  we  have  n  replicates.  For  the  ith  replicate,  i  =  1,. . .  ,n,  let 
the  observed  data  (system  failure  time)  be  mt ,  the  maximum  of  the  two  component 
lifelengths,  Xi  and  Y{.  Let  Z  =  (X ,  K),  where  X  =  (Xi,. . . ,  X„),  Y  =  (Vi, . . . ,  Fn). 
Let  Xi  ~  F$  with  9  €  @  and  suppose  9  ~  i/,  where  v  is  a  conjugate  prior  distribution 
on  9.  Let  Y{  ~  G ^  with  0  £  ®  and  suppose  0  ~  /x,  where  /i  is  a  conjugate  prior 
distribution  on  0.  Assume  that  F  and  G  are  absolutely  continuous  distribution 
functions.  To  make  the  discussion  as  easy  as  possible  to  follow,  consider  the  case 
where  Fe  =  £(9)  (the  exponential  distribution  with  parameter  9)  with  9  ~  v  = 
Q(a\ ,6j)  (the  Gamma  distribution  with  shape  parameter  ax  and  scale  parameter  6j) 
and  G =  5(0)  with  0  ~  /x  =  £(02,M- 

Before  describing  the  algorithm,  we  introduce  the  following  notation.  If  H  is  a 
distribution  function,  B  is  a  set,  and  X  is  distributed  according  to  H,  then  Hb  will 
denote  the  conditional  distribution  function  of  X  given  that  X  €  B\  that  is 

Hb{A)  =  H{AnB)/H(B)  (2.1) 

when  H(B)  >  0. 

The  algorithm  proceeds  as  follows. 

Give  arbitrary  initial  values  to  (X,  T)-0^  such  that  X- V  Y =  nii. 

For  k  =  1, . . . ,  K: 

1.  Generate  (0,0)W  ~  £data((0,0)  I  (X,  Y)lk~"). 
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2.  Generate  (X,  V)-^  ~  £dat»((Af,  T),  |  (0,  t/>)*fc)),  independently  for  each  i,  i  = 
1 


We  now  describe  these  two  steps  in  more  detail. 

In  step  1  of  the  algorithm,  AutaU^i^)  I  is  the  product  of  two  gamma 

distributions,  Q(ai  +n,6i  +  E?=i  and  £(a2  +  n,  b?  +  £"=1  V,**-1*);  i  e.  9  and 

ip  are  generated  independently.  (Note  that  knowledge  of  X  and  Y  make  knowledge 
of  the  observed  data  superfluous.)  To  carry  out  step  2,  first  let  fg  and  g^,  be  the 
densities  of  Fg  and  G^,,  respectively.  Then,  for  each  i,  i  =  1, . . . ,  n,  set 


_ fg(i‘)(mi)GjJ(k)(mi) _ 

/e(k)(»«t)G0(*)(mi)  +  g^vim^F^im,)' 


where  F\  —  G\  =  £(A).  Now,  generate  Z *fc)  by  setting  (X,  T),-**,  i  =  l,...,n,  as 
follows: 


(X,Yf'  =  l 


(m,,  V") ,  where  V  ~  with  probability  p\k) , 

{  (V,mv) ,  where  V  ~  F^k)t[0tJnt)  with  probability  (1  -  p\k]) 

where  the  notation  used  for  the  conditional  distribution  functions  is  given  by  (2.1). 

To  estimate  £^tll((0,  ^>)),  we  use  the  sequence  of  generated  Z’s  to  approximate 
the  mixture 


/  £<*,.((«  ,  ip)  |  complete  data)  d£aata( complete  data) 

by,  say,  I  (X,V)W). 

If  v  and  p  are  not  conjugate  priors  for  Fg  and  G^,  respectively,  but  they  are 
continuous  distribution  functions  with  univariate  log-concave  density  functions,  then 
we  can  still  apply  the  above  algorithm  by  using  an  efficient  rejection  scheme  of  Gilks 
and  Wild  (1992). 
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2.2  A  Naive  Extension  of  the  Algorithm  of  Doss  (1994) 

Doss  (1994)  uses  a  Gibbs  sampling  algorithm  of  length  2  involving  (X,  F).  The 
prior  he  put  on  F  is  actually  a  mixture  of  Dirichlets,  but  we  shall  consider  a  sin¬ 
gle  Dirichlet  prior  for  our  naive  extension,  since  if  the  procedure  fails  for  a  single 
Dirichlet,  it  will  fail  a  fortiori  for  a  mixture. 

Suppose  we  have  n  replicates  of  a  parallel  system  of  order  2.  Let  Z,  X,  and  Y 
be  defined  as  in  Chapter  2.1.  Suppose  X;  ~  F  and  Y,~G,  i  =  1, . . .  ,n.  Suppose 
F  ~  T>a  and  G  ~  T>p,  where  T>a  and  Vq  are  Dirichlet  priors  with  parameter  measures 
a  and  /?.  We  observe  data  =  (mj, . . . ,  mn),  where  m,  =  X,  V  Ft,  i  =  1, . . . ,  n.  Our 
goal  is  to  estimate  £a„tB(F,  G). 

We  kn~w  that  the  conditional  distribution  of  (F,  G)  given  (X,Y)  is 

Auu((F,  G)  |  Z)  =  £>q+£;*=i  Sx.  ®  2V£;=1 6* »  (2-2) 

where  <g)  denotes  product  measure;  i.e.  F  and  G  are  independent.  Also,  given  an 
updated  (F,  G),  we  can  generate  a  random  Z  conditional  on  the  data.  To  run  the 
Gibbs  sampling  algorithm,  we  first  set  initial  values  Z^0)  and  (F,  G)(0).  Then,  for 
some  large  X,  execute  the  following  loop  for  k  =  1, ....  K: 

1.  Draw  (F,G)W  from  £d.u((F,G)  |  Z(k~l)). 

2.  Draw  Z(k)  from  C^{z  \  (F,G)(k>). 

To  carry  out  the  second  step  in  the  above  loop,  we  need  to  compute  the  probabil¬ 
ity  that  each  component’s  lifelength  takes  on  the  observed  maximum,  since  one  of  the 
values  must.  Consider  the  case  where  n  =  1.  Let  the  observed  maximum  be  denoted 
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by  m.  We  assume  that  m  >  0  and  that  a  and  0  are  continuous  measures.  Suppose 
the  initial  value  of  is  (Aj,  >i)*0'  =  (m,  V),  for  some  V  <  m.  These  initial  values 
give  rise  to  (F,  G)(1*  via  the  first  step  of  the  algorithm.  By  (2.2),  F(1)  ~  T>a+sm , 
but  by  definition  of  the  Dirichlet  prior,  this  implies  that  F^({m})  is  distributed 
as  a  Beta  distribution  with  parameters  (a  +  <5m)({m})  and  (a  +  6m)({m}c),  and 
therefore,  FW({m})  >  0  with  probability  1.  Next,  recall  from  the  previous  chapter 
that  if  F  ~  V-,,  then  EF{A)  =  7o( A ),  where  A  is  a  set  and  70  =  7/7(7^).  Since 
G  ~  X>/3,  then  F6^1l({m})  =  ({m})  =  0.  Thus,  G(1,({m})  =  0  with  proba¬ 

bility  1.  In  other  words,  with  probability  1,  F ^  has  an  atom  at  m  and  G ^  does 
not,  from  which  it  is  clear  that  P{X^  =  m  |  V  =  m}  =  1.  As  we  continue 
to  run  the  algorithm,  A^,  A*3\ . . .  will  each  take  on  the  value  m.  The  case  where 
(Ai,yi)(°)  =  (V,m)  is  handled  by  symmetry.  Thus,  we  see  that  the  st,  ting  point 
does  not  get  “washed  out”  and  therefore,  the  algorithm  produces  a  Markov  chain 
that  cannot  converge  to  the  posterior  distribution. 


2.3  The  Algorithm  We  Propose:  Detailed  Description  and  Proof  of  its 
Convergence  for  the  Case  of  Parallel  Systems 

Before  introducing  the  setup  and  notation  needed  for  the  algorithm,  we  briefly 
describe  the  “extended  Polya  urn  scheme”  as  given  in  Blackwell  and  MacQueen 
(1973).  Define  a  sequence  {7\,  T2, . . .}  of  random  variables  as  a  Polya  sequence  with 
parameter  a  if  for  every  B  C  72.,  we  have  P(T\  €  B)  =  a(B)/a( 71),  and  for  every 
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n, 

P(Tn+1  e  B  I  7\, . . . ,  Tn)  =  (a(B)  +  £  <5T.(B))/(a(ft)  +  n).  (2.3) 

t=i 

Blackwell  and  MacQueen  proved  that  if  {Ti,Ti, . . .}  is  a  Polya  sequence  with  pa¬ 
rameter  a ,  then  the  empirical  distribution  of  {T), . . . ,  Tn}  converges  a.s.  to  a  limiting 
discrete  measure  H.  Furthermore,  H  ~  Va.  Also,  given  //,  the  random  variables 
7i,  T2,  ■ . .  are  iid  ~  H.  In  addition,  for  every  n, 

T\, . . . ,  Tn  are  exchangeable.  (2.4) 

Now,  we  consider  a  parallel  system  of  order  2;  the  implementation  of  the  al¬ 
gorithm  for  the  case  of  arbitrary  systems  is  discussed  in  Chapter  2.4.  Recall  that 
Xi  and  Y,  are  the  lifelengths  of  components  1  and  2  in  system  i.  For  the  X,’s,  we 
consider  a  parametric  family  He,  0  €  0  C  Rdl,  and  put  a  mixture  of  Dirichlets  as 
the  prior  on  F.  That  is, 

F~  Jvaev(d6),  (2.5) 

where  for  each  0  £  0,  ate  =  <xe(R)He,  0  <  ate(R)  <  00,  and  v  is  a  probability 
measure  on  0.  Similarly,  for  the  V^’s,  we  consider  the  parametric  family  xp  £ 
'F  C  Rr2 ,  and  put  the  mixture  /  n(dxp)  as  the  prior  on  G,  where  for  each 
xp  £  'P,  =  /3^(R)K^,,  0  <  0^,{R)  <  00,  and  /1  is  a  probability  measure  on  it. 

We  will  assume  that  for  each  0  and  xp.  He  and  are  absolutely  continuous,  with 
continuous  densities.  Given  F  and  G,  X\,  X2, . . .  are  iid  ~  F ,  and  Y\,  Y2, . . .  are  iid 

~  G.  Also,  for  every  n,  X\,...,Xn  are  exchangeable,  as  are  Y\, _ V„,  by  (2.4). 

We  assume  that  F  is  independent  of  G.  It  follows  that  (Xi, . . . ,  Xn)  is  independent 
of  (Yu . . . ,  V^).  We  observe 

data  =  (mi, . . .  ,mn),  where  Xi  V  Yi  =  mt ,  i  =  1, . . .  ,n. 
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Our  goal  is  to  estimate  £data(ir',  G).  As  in  Chapter  2.2,  knowledge  of  (X ,  Y )  makes 
knowledge  of  data  superfluous. 

Recall  that  5,,  i  =  1, . . . ,  n,  is  the  vector  of  lifelengths  for  system  i.  For  the  case 
of  a  parallel  system  of  order  2,  5,  =  (A”,,  Vi).  To  unify  the  notation,  let  So  =  (0,  xp) 
and  let  S  =  (So,  •  • . ,  S„). 

The  algorithm  proceeds  as  follows.  Fix  arbitrary  starting  values  Sq°\  . . . ,  SjfK 
Then,  cycle  through  the  N  =  n  +  1  elements  of  S  in  order;  i.e.  at  time  i,  we  update 
element  K  =  K(t)  =  (t  —  1)  mod  N  by  generating 

sj?  ~  £data(s*  |  sf~l)  for  j±K).  (2.6) 

At  time  jN,  each  component  of  S  has  been  updated  j  times.  Our  algorithm  gen¬ 
erates  vectors  S^\  t  —  1,2,...,  where  is  the  same  as  except  for  the  one 

element  Sjp  which  has  been  updated  at  time  t  according  to  (2.6).  Note  that  this 
notation  differs  slightly  from  that  of  Chapter  1,  where  was  formed  by  updating 
all  the  elements  of  S^~l\ 

In  our  algorithm,  the  updating  of  i  =  1,. . .  ,n  is  accomplished  by  the  use 
of  the  following  lemma,  the  proof  of  which  is  a  calculation. 

Lemma  2.1  Suppose  A  and  B  are  distribution  functions  on  [0,  oo)  satisfying  A  = 
Ac+Aj  and  B  =  Bc+Bd,  where  Ac  and  Bc  are  absolutely  continuous  with  continuous 
derivatives  denoted  A'c  and  B'c,  and  Ad  and  Bd  are  discrete.  Let  X*  and  Y*  be 
independent  with  distributions  A  and  B,  respectively.  For  m  >  0,  we  can  generate 
a  pair  ( X ,  Y)  with 


( X ,  Y)  ~  £((X*,  Y*)  |rvr  =  m) 


(2.7) 
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by  the  mixing  procedure  we  now  describe. 

Define  probability  distributions  on  [0,  m)  by 

^c(v)  4  / _ \1  "c(^)  D  / _ \  ’  ^4j(v)  —  A  f _  x,  — 


Ac(m)’  ~cv~'  Be(m)’  Ad(m-)’  Bd(m~Y 

If  the  denominator  in  any  of  the  equations  above  is  0,  then  we  set  the  correspond¬ 
ing  probability  distribution  to  S0.  Let  Vi,  Vi,  V3,  V4  be  random  variables  with 
distributions  Ac,  Bc,  Ad,  Bd,  respectively.  We  now  take 


(X,  Y)  = 


(Vi,  m) 

with  probability  pi , 

(to,  V2 ) 

with  probability  pi, 

(Vi,  to) 

with  probability  p^, 

(to,  V4) 

with  probability  p4, 

(m,m) 

with  probability  p5, 

(2.8) 


where  the  mixing  probabilities  pi, ...  ,p5  are  given  by  the  following  formulas. 
If  Ad{{m})  +  Bd({m})  >  0,  then 


Pi 


Ac(m)Bd({m}) 


P2  = 


Ad({m})Bc{m) 


P4  = 


D  '  D 

Ad{{m})Bd(m-) 


P3  = 


Ad(m-)Bd({m}) 
D 


D 


,  and  p5  = 


D 


with  D  such  that  J2i=iP>  =  1-  If  Ad({m})  =  Bd({m})  =  0,  then 


Pi  = 


Ac(m)B'c(m ) 
D 


P2  = 


A'c(m)Bc(m ) 


P4  = 


D 

Ac(m)Bd(m-) 


P3  = 


Ad(m-)B'c(m) 


D 


D 


,  and  p5  =  0, 


with  D  such  that  £f=1Pi  =  1. 
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By  (2.3)  and  (2.4),  for  the  tth  step  of  the  algorithm,  the  conditional  distribution 
of  Xi ,  given  the  other  (n  —  1)  AVs  and  9.  is  given  by 


= 


a fito('R-)  +  n  -  1  ‘ 


(2.9) 


Similarly,  for  the  tth  step  of  the  algorithm,  the  conditional  distribution  of  V,,  given 
the  other  (n  —  1)  Yt  s  and  ip,  is 


/V°  +  &  Y(‘) 

P^o{K)  +  n-  1' 


(2.10) 


Thus  we  can  generate  =  (Xjl\Y/^)  by  using  Lemma  2.1  with  A  =  dj^,  Z?  = 


and  m  =  mt. 

To  update  in  the  above  algorithm,  we  will  need  formulas  for  updating  the 
“mixing  measures”  v  and  p.  The  formula  for  the  conditional  distribution  of  9  given 
X  is  well  known;  see  e.g.  Theorem  1  of  Doss  (1994),  which  is  repeated  as  Proposition 
2.1  below.  The  formula  for  the  conditional  distribution  of  ip  given  Y  will  be  evident 
by  symmetry.  The  notation  #(v)  is  used  to  denote  the  number  of  distinct  values  in 
the  vector  v. 


Proposition  2.1  Assume  that  for  each  0  G  0,  He  is  absolutely  continuous,  with 
a  density  he  that  is  continuous  on  71.  If  the  prior  of  F  is  given  by  (2.5),  then  the 
(marginal)  posterior  distribution  of  0  given  X  =  x  is 


where  the  **’  in  the  product  indicates  that  the  product  is  taken  over  distinct  values 
only,  T  is  the  gamma  function,  and  c(x)  is  a  normalizing  constant. 
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From  (2.11)  and  the  independence  of  F  and  G,  we  can  update  5q()  by  indepen¬ 
dently  generating  0^  and  xp^  from  i/x(n  and  /Zyfu,  respectively. 

Let  7 r  be  the  posterior  distribution  of  S  given  data  on  the  space  (7?2n  x  ©  x  ty,  B), 
where  B  is  the  collection  of  Borel  sets  on  7£2n  x@x$.  This  ir  is  also  a  stationary 
distribution  for  the  chain  {S^N*}°L0.  We  now  show  that,  for  a  parallel  system  of 
order  2,  the  distribution  of  the  Markov  chain  converges  to  7r  at  a  rate  that 

is  geometric  and  independent  of  the  starting  point.  Convergence  occurs  in  general, 
but  the  difficulties  in  establishing  it  arise  when  considering  a  parallel  system  or 
a  parallel  subsystem.  We  discuss  this  further  in  Chapter  2.4.  Before  stating  our 
theorem,  we  introduce  some  notation.  Define 

D  =  {«  :  Xi,yi  >  0  and  X;  V  for  i  =  1, . . .  ,n,  0  e  0,  ip  G  'P}, 

and  let  PJ(s,  C)  be  the  j-step  transition  probabilities  for  the  chain;  i.e. 

P]{8,C)  =  P{S{iN)  €  C  I  S(0)  =  8). 


Note  that  if  v  is  absolutely  continuous  with  respect  to  Lebesgue  measure,  then 
so  is  vx.  This  implies  vx  has  a  density  with  respect  to  Lebesgue  measure,  which  we 
shall  denote  v'x.  In  the  proof  of  Theorem  1,  we  shall  need  to  find  bounds  on  vx{0)  for 
fixed  6  as  x  varies  over  the  compact  set  [0,  m(„)]n.  Here,  m(n)  =  max(mi, . . .  ,mn). 
This  would  be  straightforward  if  vx(Q)  were  continuous  in  x,  but  inspection  of  (2.11) 
clearly  shows  this  is  not  the  case.  For  this  reason,  we  introduce  functions  fi(x,0) 
on  TV  x  ©,  defined  by 
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where  the  values  c^(x)  are  constants  such  that,  for  each  x,  f,(x.&)  is  a  density  in  6. 
Note  that 

*4(0)  =  /#(»>(*•,*), 

where  x*  is  comprised  of  the  distinct  values  of  x  arranged  in  any  order. 

Theorem  1  Consider  the  Markov  chain  {S^N^}]  resulting  from  applying  the  above 
algorithm  to  a  parallel  system  of  order  2.  Suppose  that  v  and  p  are  absolutely 
continuous  with  respect  to  Lebesgue  measure  and  that 

1.  The  observed  maximum  values  are  distinct  and  positive. 

2.  There  exists  a  compact  set  Ti  C  0  with  i/(Ti)  >  0  such  that  atg(x)  exists,  is 
positive,  and  is  continuous  in  both  0  and  x  for  {6,x)  £  x  [0,  m(„)];  similarly 
there  exists  a  compact  set  T2  C  ^  corresponding  to  /%. 

3.  The  prior  v  is  absolutely  continuous  with  respect  to  Lebesgue  measure  and 
for  each  i,  i  =  1, . . .  ,n,  /<(•,  •)  is  positive  and  continuous  on  [0,m(„)]‘  x  Ti;  an 
analogous  condition  holds  for  the  prior  and  posterior  of  0. 

Then,  there  exists  a  value  A  >  0  such  that 

sup  \P]{8,C)-x(C)\<e~Xj  for  all  j  >  2.  (2.12) 

CeB, 

Recall  that  if  {Zk)°k =o  a  Markov  chain  on  {Z,J-)  with  n-step  transition  proba¬ 
bilities  Pn(z,C)  which  satisfy  the  Doeblin  condition,  i.e.  for  some  probability  mea¬ 
sure  p  on  (Z,T),  some  positive  integer  n0,  and  some  e  >  0, 


Pn°(z,C)  >  ep(C)  for  all  z  £  Z  and  all  C  £  T , 


(2.13) 
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then 

sup|Pn(z,C)-7r(C)|<(l-e)Ln/noJ  for  all  2  €  Z.  (2.14) 

The  proof  of  this  fact  involves  a  coupling  argument  which  we  sketch  below.  We 
may  write 

Pn°(2,-)  =  (1  -  e)j?(z,-)  +  ep(-)  for  all  2  €  Z.  (2.15) 

where  »/(z,  •)  =  ( Pn°(z ,  •)  —  cp(-))/(l  —  e).  By  the  Doeblin  condition  (2.13),  r/(z,  •)  is 
a  probability  measure.  Thus,  P"°(2,  •)  is  a  convex  combination  of  two  measures,  the 
second  of  which  does  not  involve  z.  Therefore,  the  representation  in  (2.15)  allows 
us  to  view  each  step  in  the  evolution  of  the  Markov  chain  { },  as  a  coin-tossing 
experiment  followed  by  a  draw  from  either  tj(z,  •)  or  />(•).  This  is  the  key  that  makes 
possible  the  coupling  argument. 

Now  run  two  chains  {W*}*  and  {Yt}(  as  follows.  Let  W0  =  w0  and  Vo  ~  tt.  If  the 
current  state  of  the  two  chains  is  yn_i),  generate  Wn  and  Yn  by  first  tossing  a 

coin  with  probability  of  heads  equal  to  e.  If  the  toss  results  in  a  head,  select  V'  ~  p, 
and  set  both  Wn  and  Yn  equal  to  V.  If  the  toss  results  in  tails,  independently  select 
Wn  from  T](wn-i ,•)  and  Yn  from  r)(yn-\,‘).  It  is  clear  that  {W*},  and  {V/},  are 
Markov  chains  with  transition  probabilities  Pn°/(2;,C)  and  that  V*  ~  %  for  all  i. 
Moreover, 

P{W(  ±  Ye  for  some  t  >  *}  <  (1  -  e)fc. 

This  argument  shows  that  the  Markov  chain  {Z^ },  satisfies 

sup|P{Zncfc€C}-7r(C)|<(l-e)fc, 

czr 

from  which  (2.14)  follows,  since  supC€^- 1 Pn(z,C)  —  tt(C)|  is  nonincreasing  in  n. 


Results  of  the  form  (2.12)  are  generally  proved  by  verifying  the  Doeblin  condi¬ 
tion  (2.13).  However,  we  shall  prove  our  theorem  by  dealing  with  the  underlying 
coupling  argument  directly  and  explicitly  because  our  arguments  are  then  easier  to 
follow. 

Proof  of  Theorem  1  Consider  starting  the  Gibbs  sampler  from  two  different 
initial  states  and  S and  producing  two  sequences  and  We  shall 

“couple”  these  sequences  by  defining  them  on  the  same  probability  space  in  such  a 
way  that 

P{S(2N)  =  S{2N)}  >  e,  (2.16) 

where  c  is  positive  and  can  be  chosen  independently  of  the  starting  states  S,0)  and 
S^°\  This  clearly  suffices  to  prove  the  theorem. 

Condition  (2.16)  indicates  that  our  proof  requires  two  passes  of  the  algorithm 
(or  equivalently,  two  “cycles”  of  the  Gibbs  sampler)  to  couple  the  two  sequences.  In 
the  first  pass,  we  show  that  there  is  a  positive  lower  bound  not  depending  on  the 
starting  states,  for  the  probability  that  for  each  i,i  =  1, . . . ,  n,  the  minima  X \  A  Vi 
and  Xi  A  Yi  are  not  equal  to  any  of  the  observed  maxima  m i, . . . ,  mn.  Then  in  the 
second  pass,  we  show  that  there  is  also  a  positive  lower  bound  for  the  probability 
that  for  each  i,  i  =  1, . . . ,  n,  X,  =  X,  and  Vi  =  Vi;  i.e.  S,  =  5,-.  The  probability  that 
So  =  So  is  handled  in  a  manner  similar  to  the  proof  of  (2.14). 

A  simple  example  may  help  to  identify  the  issues  involved.  Suppose 
that  n  =  2  and  m\  <  m2  with  starting  states  S =  (Sq0\  S{0),  Sj0*)  = 
((0,^),(£1,mi),(mi,m2))  and  5(0)  =  ((^,^),(mi,£2),(m2,m1)),  where  tx,t2  <  mx. 
At  time  t  =  1,  we  update  9 ,  ^  and  0,  ip,  but  this  is  not  important.  At  time  t  —  2,  we 
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update  (Xx,  yj)  and  (Xi,  Vi),  and  this  is  where  the  problem  arises.  For  the  sequence 
{5*^},  Aj2^  has  an  atom  at  mi,  but  B {2)  does  not,  so  P(X[2)  =  mj,  F'/2*  <  =  1. 

For  the  sequence  {5  },  B\  has  an  atom  at  m x,  but  A\  '  does  not,  so  P(  xr'  < 
mi,  Y^  =  mi)  =  1.  Thus,  with  probability  1,  the  two  sequences  are  not  coupled 
during  the  first  pass  of  the  algorithm.  In  the  proof,  we  will  show  that  after  one  pass 
of  the  algorithm,  the  atoms  at  mi  and  m;  axe  “out  of  the  way”  and  thus,  there  is 
a  positive  probability  (that  is  independent  of  the  starting  states)  that  S*6)  =  S  , 
making  (2.16)  true. 

The  following  will  involve  a  specific  implementation  of  the  algorithm  described 
by  (2.6)  through  (2.11),  strictly  for  use  in  our  coupling  argument.  Let  {£/,*;  j  — 
l,...,oo; 

k  —  1,2, 3, 4}  be  an  array  of  independent  W(0, 1)  (the  uniform  distribution  on  (0, 1)) 
random  variables.  We  shall  generate  0W,  Xf*\  and  in  terms  of  these 
uniform  random  variables  using  certain  functions  p\,p2,  and  p3.  These  functions 
and  their  properties  are  described  in  detail  later  in  this  chapter.  Specifically,  we 
define  the  functions  and  show  that  using  them  produces  the  desired  conditional 
distribution  of  S ^  given 

For  t  with  ( t  —  1)  mod  N  =  0,  define  Sq  *  =  by 

0(t)  =  Pi(X^-l\Utl,Ut2)  and  </>(t)  =  P2{Y(t~x\Ut3,Ut,),  (2.17) 

and  for  t  with  (<  —  1)  mod  N  =  K  ^  0,  define  S$  =  (Xj£\  Y^)  by 

(Xi\Y^)  =  P3(0(<-1\  rnK,  ^(Lk),  ^(-K),  Utu  Vn,  Ua),  (2.18) 

where  W(_/q  is  used  to  denote  the  (n  —  l)-tuple  obtained  by  deleting  the  Kth 


element  of  the  n-tuple  W. 
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For  the  process  {S***},  the  quantities  0(t\  ft*\  „Y(t\  and  are  defined  in  the 
same  way  except  that  9 ,  t/>,  X ,  and  V  are  everywhere  replaced  by  6,  rp,  X,  and 
V.  The  same  uniform  variates  are  used  in  generating  both  S(t)  and  S(t). 

The  essence  of  the  proof  consists  of  showing  that  there  is  a  positive  proba¬ 
bility  (which  is  independent  of  the  starting  states)  of  coupling  the  two  processes 
if  we  implement  the  algorithm  using  pi,p2,  and  p3  (i.e.  (2.16)  holds).  To  this 
end,  we  introduce  a  sequence  of  events  r4i,  A2l . . . ,  A2n  defined  as  follows.  Let 
M  =  For  t  =  1  and  t  =  N+  1,  we  define 

At  =  {ow  =  v>(t)  =  ft* >,  0(t)  €  ru  ftl)  e  r2}. 

(The  sets  Ti  and  T2  are  the  compact  sets  given  by  assumption  2  of  Theorem  1.)  For 
2  <  t  <  N  and  K  =  (t  —  1)  mod  N  we  define 

A,  =  {*£>  A  i  M,  xj?  A  Yp'  $  M). 

For  N  +  2  <  t  <  2 N  and  K  =  (t  —  1)  mod  N  we  define 

At  =  =  x!l\  Yp'  =  Yp\  x]p  A  Yp'  tM). 

Note  that  flt^i  At  implies  that  S,2i'V|  =  S^2N\  We  can  write 


where  Ci  =  P{A\)  and  ( t  =  P(-<4t  |  f\j<tAj)  for  t  >  1.  We  shall  show  that  each  of 
the  values  Ct  are  bounded  away  from  0  by  quantities  which  do  not  depend  on  S(0) 
and  S^°\  In  the  last  part  of  this  chapter,  we  establish  these  bounds  in  Facts  1-6. 
Specifically,  Fact  1  is  used  to  bound  Ci  and  C/v+i!  and  Facts  2  and  3  are  used  to  bound 
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<^2,  •  •  • ,  C N-  Now  consider  N  +  2  <  t  <  2 N  and  K  =  {t  —  1)  mod  N.  Conditional  on 
fljct  A],  the  vectors  and  y|_K-j  contain  no  entries  equal  to 

mK-  (Here  we  are  using  assumption  1  of  Theorem  1;  i.e.  the  values  mx, . . . ,  mn  are 
distinct.)  This  will  allow  us  to  use  Fact  6  to  bound  Cn+2iCn+3,-  ■  •  02 n*  which  will 
establish  (2.16)  and  complete  the  proof  of  Theorem  1. 

Details  of  Proof  of  Theorem  1  We  shall  now  define  in  detail  the  functions 
Pi,p2i  and  />3  and  demonstrate  the  properties  of  these  functions  that  are  used  in 
the  proof  of  our  theorem.  In  this  discussion  we  will  use  the  following  notation. 
If  H  is  a  distribution  function,  then  H *  will  denote  a  function  with  the  property 
that  H\U)  ~  H,  when  U  is  a  £f(0, 1)  random  variable.  Such  an  H *  always  exists. 
In  the  case  of  a  distribution  function  on  TV,  the  function  //t  may  be  taken  to  be 
H\y )  =  inf{x  :  H(x)  >  y}.  For  economy  of  notation,  if  h  is  a  density,  then  h f  is 
used  if  the  distribution  function  associated  with  h  has  not  been  introduced.  Finally, 
let  U\,  U2,  U3,  U4  be  independent  £f(0, 1)  random  variables. 

By  (2.17),  we  generate  9  and  -0  from  uniform  random  variables  using  functions 
pi  and  p2  defined  below.  Define 
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Note  that  ir i  >  0.  This  is  because  by  assumption  3  of  Theorem  1,  for  each  i,  the 
function  /,  is  bounded  away  from  0  over  [0,  m(n)]'  x  Tj.  Therefore,  for  each  u  6  I\, 
<p\{u)  is  positive.  It  follows  that  f0<pi(u)du  >  JFi  0i(u)du  >  0. 

Note  that  in  the  special  case  of  the  “exponential/gamma”  setup,  we  have 


M“)  =  M.  £(«  I  u{x),b(x)) 


where  a(ac)  =  ax  4-  nj,  and  6(*)  =  &i  +  £*  X{.  For  x  with  0  <  x,  <  m,,  i  —  1, . . .  n, 
(a(x),b(x))  ranges  over  a  compact  set  which  excludes  (0,0).  Thus,  it  is  clear  that 
<pi(u)  >  0  for  all  u  >  0. 

Now  we  define  the  function  p\.  For  given  values  x, ul,u2l  we  let 


Pi(x,u  i,u2) 


<^l(u2)  if  Ui  <  7T!, 

Vx(u2)  otherwise. 


It  is  easy  to  verify  that  pi(x,  Ui,  U2)  ~  vx. 

For  generating  rp  we  define  a  function  p2  in  a  similar  manner.  Let 


Mu )  =  Py(u), 

?r2  =  J  Mu)du, 


and  densities 


uu)=*m  *ua 

X2  1  —  7T2 


Note  that  7r2  >  0.  Now  we  define  p2.  For  given  values  y,«i,u2,  we  let 


P2(y,ui,u2 )  =  < 


<t>l(u2)  ifui<7T2, 

Py(u2)  otherwise. 


It  is  easy  to  verify  that  p2(y,Uz,Ui)  ~  vy. 
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In  the  proof  of  our  theorem,  we  need  the  following  fact  regarding  p\  and  p2 .  Let 
*,x,y,y  be  arbitrary  fixed  values  in  E.  Define  8  =  p\{x,  U\,  (/2),  9  =  pi(x ,  U\,  U2), 
xp  —  P2{y,U3,Ud),  and  ip  =  p2{y,  U3,  L\).  From  the  definitions  of  pi  and  p2  it  is 
clear  that 

P{9  =  9,  6  €  Ti}  >  tti  f  4>\(u)du  =  /  (f>\(u)du  >  0, 

Jv  1 

and  similarly, 

P{ip  =  ip,  ip  €  r2}  >  7t2  f  d>2(u)du  =  /  <p2{u)du  >  0, 

J  r2  Jv2 


so  that  by  independence  we  obtain  the  following. 

Fact  1 


P{6  =  8, ip  =  ip, 9  €  Ti,ip  €  r2}  >  [  <pi(u)du  f  <p2{u)du. 

JTi  J  r2 


Note  that  this  bound  does  not  involve  x,x,y,  and  y. 


By  Equation  (2.18),  the  generation  of  Sk  =  (Xk,  Yk)  will  be  done  using  a 
function  p3(9,ip,m,x,y,ui,u2,u3),  where  to  >  0,  x  =  (x1? . . . ,  xn-i)  and  y  = 
(j/i, . . . ,  3/n— 1 )  are  (n  —  l)-tuples  with  x,,  y,  >  0  for  all  i.  The  function  p3  will  be 
defined  in  the  following  discussion. 

In  Lemma  2.1,  take 


—  Qg  ~b  Nx 
ag(TZ)  +  n  -  1 


and  3  f*  +  Ny  ,• 

+  n  -  1 


where  Nx  =  and  iVy  =  £"7/  <5V| .  Define  pi  .  ..,p5  and  the  distributions 

Ac,  Bc,  Ad,  Bd  as  in  Lemma  2.1.  These  quantities  are  all  (implicitly)  functions  of 
8,xp,x,  and  y. 
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Now  define  p3  by 

(^(u3),m)  if  0<uj<pi+p2  and 

0  <  u2  <  pi/(pi  H-  Pa), 

(m,  i?,!(ti3))  if  0  <  ux  <  pi  +  p2  and 
Pi/(Pi  +P2)  <  «2  <  1, 

(i4j(u3),m)  if  Pi  +P2  <  <  Pi  +P2  +P3, 

(m,i?J(ti3)  ifpi+P2+P3  <u\<  P1+P2+P3+P4, 

(m,m)  if  Pi  +  P2  +  P3  +  P4  <  «i  <  1. 

It  is  clear  that  the  pair  (X,  Y)  =  P3(^,V>,m,a;,y,C/i,C/2,t/3)  has  the  distribution 
given  in  (2.7)  since  the  function  p3  merely  describes  a  particular  way  to  carry  out 
the  mixing  procedure  in  Lemma  2.1. 

The  function  p3  has  two  properties  which  we  use  in  the  proof  of  our  theorem. 
Pick  arbitrary  fixed  values  0  G  0,  x/j  €  m  >  0,  and  x,x,y,y  in  72.n_1.  Define 

(X,Y)  =  p3(0,^,m,x,y,[/1,(72,t/3)  and 

( X,Y )  =  p3(0,^,m,x,y,[/i,U2,[/3). 

First  note  that  X  A  Y  is  generated  from  a  continuous  distribution  when  U\  < 
Pi  +  p-2 .  Therefore  we  have  the  following  fact. 

Fact  2  If  U\  <  (pi  +  P2)  A  (pi  +  P2),  then  both  X  AY  and  X  AY  are  generated 
from  continuous  distributions. 

In  conjunction  with  this  fact,  we  note  that  pi  +  P2  >  //(/  +  1),  where 

_  qg(m)  A 
J  ~  n-  1 
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To  see  this,  we  go  back  to  the  expressions  for  pi, . . .  ,p5  given  in  Lemma  2.1.  We 
find  that 


A d(m)pi  =  Ac(m){p3  +  p5)  and  Bd{m)p2  =  £c(m)(p4  +  p5). 


Since 


Ad(m)  < 


Bd{m)  < 


ae(72.)  +  n  —  1 


Mm)  =  ■;  -rSr-rz — r,  and  Bc(m)  < 


0xb('R-)  +  n  —  1 


ae(1Z)  +  n  -  1 


MR)  +  n  ~  1  ’ 


these  relations  imply 


(n  -  l)p!  >  Qg(m)(p3  +  p5)  and  (n  -  l)p2  >  ^(m)(p4  -f  p5)- 


Thus 

Pi  +  P2  >  /  •  (Pz  +  P4  +  Ps)  =  /  •  (1  -  Pi  -  P2) 

(see  the  expressions  for  pi, . . .  ,p$  given  in  Lemma  2.1)  so  that  pi  +p2  >  //( 1  +  /) 
as  desired.  From  assumption  2  of  Theorem  1,  we  know  that  /  is  bounded  below  for 
0  €  and  i/>  €  T2.  Therefore,  we  can  state  the  following  fact. 

Fact  3  For  9  €  Ti  and  i/>  €  T2,  the  value  (pi  +P2)  is  bounded  away  from  0  by  some 
quantity  which  does  not  depend  on  0,il>,x,  or  y. 

Now  observe  that  A  and  A  have  the  same  continuous  part;  that  is  Ac  ~  Ac.  This 
implies  that  Ac  =  Ac  and  thus  (Ac)*  =  (Ac)*.  Similarly,  ( Bc )*  =  ( Bc )L  This  leads 
to  our  next  property. 

Fact  4  IfU\<  (pi  +  p2)  A  (pi  +  P2)  and 

TT  ^  Pi  a  Pi  tt  ^  Pi  w  Pi 

t/2  <  - A  - - —  or  (J2  >  - V  - - — , 

Pi  +  P2  Pi  +  P2  Pi  +  P2  Pi  +  P2 

then  X  —  X  ,  Y  =  Y,  and  X  AY  is  generated  from  a  continuous  distribution. 
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Also  note  that  when  JVx({m})  =  Ary({m})  =  0,  we  have 

Pi  _ _ Qg(m)/?;(m) _ 

P1+P2  a9(m)j3'^{m)  +  a'B(m)^(m) 

so  that  pi/(pi  +P2)  does  not  depend  on  x  and  y  in  this  situation. 

Now,  using  properties  in  assumption  2  of  Theorem  1,  we  obtain  Fact  5. 

Fact  5  If  Aa.({m})  =  ^y({m})  =  0,  then  for  9  €  Tx  an d  0  €  r2,  the  value 
Pi/(Pi  +P2)  can  be  bounded  away  from  0  and  1  by  quantities  which  do  not  depend 
on  0,ip,x,  and  y. 

Combining  Facts  3,  4,  and  5  leads  to  our  last  property. 

Fact  6  IfNx({m})  =  Ny{{m})  =  N2{{m})  =  iV~({m})  =  0,  6  G  rlt  and  0  €  r2, 
then  the  probability  of  the  event 

{X  =  X,Y  =  Y,  and  X  A  Y  is  generated  from  a  continuous  distribution} 
is  bounded  away  from  0  by  a  quantity  which  does  not  depend  on  9, 0,  x,y,x,  or  y. 
This  concludes  the  details  of  the  proof  of  Theorem  1 . 


Note  that 

a(F,G)  =  J (1>a,9+^”=l  «X|  ®  ^\,+£"=1  )  £data(d0,  <*0,  dX ,  dY). 

(In  particular,  the  marginal  posterior  distribution  of  F  is  a  mixture  of  Dirichlets, 
and  a  similar  statement  holds  for  G.)  Let 


(f,g)°N)  ~  vQe{jN)+z:=1sxUN)  ®  v^us)+y::=1s. 


■i}N) 
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Corollary  2.1  Let  QgN *  denote  the  distribution  of  ( F,  1  when  the  chain 
{S(tN)}(  is  started  at  a.  Then  under  the  conditions  of  Theorem  1 ,  we  have 

sup  \Q%]{(F,G)UN)  e  A}  -  Pd*t*{{F,G)  e  A) I  <  e"Aj  for  all  s<°>  (2.19) 

where  A  is  the  same  A  that  appears  in  the  statement  of  Theorem  1. 

(In  (2.19),  A  =  Ao  x  Ao ,  where  Ao  is  the  smallest  cr-field  on  the  set  of  probability 
measures  on  TL  such  that  the  map  P  t— ►  P(B)  is  measurable  for  each  Borel  set 

Ben.) 

Proof  of  Corollary  2.1  Let  7r*  be  the  distribution  of  when  the  Markov 
chain  is  started  at  8.  Fix  A  6  A  and  let 

/a(s)  =  2>*e+£»=1«Xi  ® 

We  know  that 

WSSV)  -  M(F.C)  e  A}|  =  |  /  -  J 

=  \J  fA(*)d(x%'-*)(*)\  (2.20) 

=  |/  -*)*{•)- j  -*)  (*)|- 

(For  a  signed  measure  A,  the  representation  A  =  A+  —  A“  is  the  standard  Jordan 
decomposition  of  A.)  Since  /*(•)  is  a  measurable  function  of  a  that  satisfies  0  < 

/A(a)  <  1  for  all  a,  we  see  that  each  of  the  two  integrals  in  the  last  line  of  (2.20)  is 
bounded  by  exp(— Aj),  and  this  proves  Corollary  2.1. 


where  beta(a,  &)(•)  is  the  Beta  density. 


2.4  The  Algorithm  for  Arbitrary  Systems 

In  this  chapter  we  discuss  the  implementation  and  convergence  of  the  algorithm 
in  the  general  case  and  also  the  issues  of  identifiability  and  frequentist  consistency. 

2.4.1  Implementation  of  the  Algorithm 

Let  the  autopsy  statistics  for  system  i,  i  =  1, . . .  ,n  be  (71,,  Dt),  where  Tt  is  the 
death  time  of  the  system  and  D,  is  the  set  of  components  that  are  dead  at  time  T,. 
Recall  that  after  examining  (71;,  Dt),  each  component  in  system  i  is  put  into  exactly 
one  of  the  categories  Cl,  C2,  C3,  or  C4  described  in  Chapter  1.  For  a  component  in 
Category  Cl,  one  generates  an  observation  according  to  the  distribution  ( A-^)^,*,), 
where  A-^  is  defined  in  (2.9)  (i.e.  the  distribution  A-^  restricted  to  (Ti,  oo)  and 
renormalized  to  be  a  probability  measure).  Similarly,  for  components  in  Category 
C2,  we  generate  an  observation  from  (Af*)[0)Tt).  For  a  component  in  Category  C3, 
nothing  needs  to  be  done. 

Suppose  there  are  k  components  that  fall  in  Category  C4.  We  then  use  an 
extension  of  Lemma  2.1  describing  the  conditional  distribution  of  k  independent 
random  variables  ( k  >  2),  whose  distributions  have  both  absolutely  continuous  and 
discrete  components,  given  the  value  of  their  maximum.  The  necessary  formulas  are 
easy  to  derive  but  require  elaborate  notation  to  write  down  explicitly,  and  so  are 
not  given  here.  We  note  however,  that  the  needed  computer  algorithm  is  relatively 
easy  to  implement. 
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We  remark  that  the  case  of  an  arbitrary  coherent  system  is  no  more  difficult  than 
that  for  a  general  parallel  system  if  we  note  that  the  set  of  components  in  Category 
C4  changes  from  system  to  system. 

2.4.2  Convergence  of  the  Algorithm 

When  considering  the  case  of  an  arbitrary  system,  it  is  helpful  to  first  look 
at  the  situation  when  the  prior  distribution  on  each  Fj  is  a  single  Dirichlet,  i.e. 
there  is  no  mixing.  In  this  case,  the  updating  of  the  lifelength  of  component  j  in 
system  i  is  based  on  (2.9),  where  a^o  is  replaced  simply  by  a.  When  updating  a 
component  in  Category  Cl,  the  probability  of  drawing  from  the  fixed  probability 
measure  proportional  to  a(-  n(T,,  oo))  is  bounded  below  by  ot((Tt,  oo))/(a((T,,  oo))  + 
n  —  1),  independently  of  the  current  state  of  the  chain.  A  similar  statement  holds 
for  the  lifelengths  of  components  in  Category  C2.  We  have  already  explained  how 
to  deal  with  the  lifelengths  for  components  in  Category  C4  in  Section  2.3.  Thus  a 
coupling  argument  along  the  lines  of  the  proof  of  Theorem  1  gives  convergence  at  a 
uniform  geometric  rate. 

When  the  priors  on  the  F/ s  are  mixtures  of  Dirichlets,  a  difficulty  arises  in  that 
the  distributions  a^t)  in  general  need  not  have  a  uniform  lower  bound.  For  parallel 
systems  we  were  able  to  find  a  uniform  lower  bound  for  the  posterior  distribution  of 
0  given  the  lifelengths  X  only  because  X  is  known  to  lie  in  a  compact  set.  Since  the 
lifelengths  of  components  in  Category  Cl  do  not  lie  in  a  compact  set,  this  argument 
no  longer  applies.  For  general  systems  convergence  of  the  Markov  chain  can  be 
established  using  the  lower  bounds  established  in  Theorem  1  in  conjunction  with 
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Theorem  1  of  Athreya,  Doss,  and  Sethuraman  (1992),  which  gives  simple  ergodiritv 
(i.e.  convergence,  but  not  at  a  geometric  rate). 

2.4.3  Identifiability  and  Frequentist  Consistency 

We  begin  by  giving  a  precise  definition  of  “consistency  of  the  posterior  distri¬ 
bution  of  F  =  (Fi, . . . ,  Fmy .  Let  X{,  be  the  lifelength  of  component  j  in  system 
i  ( j  =  i  =  1,2,...),  find  assume  that  for  each  j,  the  random  variables 

Xij,  X2j, . . .  form  an  iid  sequence  from  Fj0\  Let  data(n)  be  the  set  of  autopsy 
statistics  for  the  first  n  systems.  We  say  that  the  posterior  distribution  of  F  is 
consistent  at  F ^  =  (Fj°\ . . . ,  F$)  if  with  probability  one,  £^Bta(n)(F)  converges 
in  distribution  to  the  point  mass  at  F^°\  in  the  weak  topology  on  Vm.  We  shall  say 
that  the  posterior  is  consistent  if  it  is  consistent  for  every  F(0)  6  Vm.  This  notion 
of  consistency  refers  to  the  posterior  and  not  to  estimators. 

The  Dirichlet  priors  on  the  positive  integers  were  originally  introduced  by  Freed¬ 
man  (1963)  as  a  prior  with  good  consistency  properties:  He  proved  that  under 
standard  iid  sampling,  a  Dirichlet  prior  is  consistent  for  ail  parameter  points  in  the 
topological  support  of  the  prior.  His  results  were  generalized  to  the  continuous  case 
by  Fabius  (1964).  Diaconis  and  Freedman  (1983)  showed  that  mixtures  of  Dirichlet 
priors  are  consistent  if 

supag(7?.)  <  oo  (2.21) 

9 

(in  the  notation  of  (2.5)).  Note  that  this  result  pertains  to  the  parameter  F  and 
not  to  the  mixture  parameter  0;  see  p.  1117  of  Diaconis  and  Freedman  (1983). 

It  is  easy  to  see  that  the  posterior  is  not  consistent  for  parallel  systems,  since 
it  is  easy  to  construct  two  distinct  m-tuples  (Ff°\ . . . ,  F^)  and  ( F ,(1\ . . . ,  F^])  for 
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which  the  autopsy  statistics  have  identical  distributions.  Of  course,  as  mentioned 
earlier,  such  systems  are  not  identifiable. 

In  a  recent  paper,  Lo  (1991)  showed  that  for  the  random  censorship  model  of 
survived  analysis  (which  corresponds  precisely  to  the  autopsy  model  for  a  two  com¬ 
ponent  series  system),  the  posterior  is  consistent  for  any  pair  (ff°\  F^)  such  that 
Fi0)  and  have  the  same  essential  maxima.  A  condition  analogous  to  (2.21)  is 
required  on  the  priors  on  Fi  and  F 2.  We  note  that  the  consistency  results  for  the 
Kaplan-Meier  estimator  imply  that  series  systems  are  identifiable  for  pairs  Fx(0)  and 
F^  having  the  same  essential  maxima. 

Since  series  and  parallel  systems  are  often  considered  extreme  cases  in  coherent 
structure  theory  it  is  natural  to  conjecture  that  under  condition  (2.21)  for  each 
component,  the  posterior  is  consistent  at  (Fx°\ . . . ,  Fjfl)  if  the  system  is  identifiable 


CHAPTER  3 

ANALYSIS  OF  U.S.  AIR  FORCE  C-17  FUEL  QUANTITY 

COMPUTER  DATA 


We  illustrate  our  algorithm  on  data  involving  survival  times  of  the  Fuel  Quantity 
(FQ)  Computer  system  of  the  C-17  transport  aircraft.  The  test  program  will  eventu¬ 
ally  involve  six  aircraft  being  flown  for  approximately  10000  cumulative  hours.  Our 
data  set  is  taken  relatively  early  in  the  test  program,  since  only  2440  flight  hours 
had  been  accumulated  at  the  time  of  this  writing.  The  data,  listed  below  in  Table 
3.1,  fall  into  one  of  three  categories  (we  denote  the  failure  times  of  the  “A-bus”  and 
“B-bus”  as  X  and  Y,  respectively): 

•  The  FQ  computer  fails  (both  buses  are  dead)  and  the  maximum  survival  time, 
say  t0,  of  the  two  buses  is  observed;  i.e.  we  have  the  usual  autopsy  statistics 
(system  failure  time  and  set  of  dead  components).  This  type  of  observation 
has  the  form  “Max=<0”  • 

•  The  two  components  are  checked  at  time  t\  and  time  t2.  Both  buses  are  alive 
at  t\ ,  but  one  of  the  buses,  say  “B”,  is  in  a  failed  state  at  t2.  Even  though  the 
“A-bus”  is  alive  at  t2,  the  FQ  computer  is  replaced.  This  situation  generates 
two  observations,  which  have  the  form  Y  €  (^i ,  <2]  and  X  €  (£2,00)- 

•  Both  components  of  the  FQ  computer  are  alive  when  the  data  are  taken,  but 
the  aircraft  had  flown  for  £3  hours.  The  failure  times  for  both  buses  lie  in 
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the  interval  [<3, 00).  Thus,  two  observations  are  generated:  .Y  €  (*3, 00)  and 
Y  €  (*3,00). 

Note  that  this  data  structure  is  a  bit  more  complex  than  the  data  structure  in  the 
autopsy  model;  however,  the  required  modifications  to  the  algorithm  involve  no  real 
difficulties. 

The  reader  may  wonder  why  there  is  a  need  for  a  computer,  as  opposed  to  a 
simple  analogue  gauge,  to  deal  with  fuel  quantity.  Indeed,  this  is  not  a  frivolous 
issue.  There  is  actually  a  need  for  a  computer  even  during  level  flight,  since  the 
aircraft  maintains  its  desired  center  of  gravity  via  fuel  transfer  from  one  wing  to 
another.  This  task  is  further  complicated  as  the  aircraft  flies  at  different  angles  or 
possibly  under  turbulence.  The  FQ  computer  receives  the  current  angle  of  flight 
from  another  computer  and  uses  this  information,  along  with  readings  from  a  series 
of  probes  in  each  fuel  tank,  to  make  accurate  fuel  quantity  calculations.  Also,  the 
Mission  Computer  requires  input  from  the  FQ  computer  to  make  range  calculations. 

We  analyzed  the  C-17  data  using  our  proposed  algorithm.  We  took  our  prior  on 
both  F  and  G  to  be  (2.5),  where  Hg  is  the  exponential  distribution  with  parameter 
6  (the  mean  is  I/O).  We  assumed  atg(1Z)  to  be  constant  in  0  and  considered  three 
cases:  otg(R)  =  1,  ag(Tl)  10,  and  a$( K)  =  100.  We  took  v  =  Q{a,b)  (the  Gamma 
distribution  with  shape  parameter  a  and  scale  parameter  6),  since  we  wished  to 
center  the  prior  around  the  family  of  exponential  distributions,  and  the  Gamma  is 
conjugate  for  this  family.  From  (2.11),  i>x  =  £(a  +  n*,b+  H'-Y,),  where  n *  is  the 
number  of  distinct  observations  in  X  and  is  the  sum  of  the  distinct  .Yt's;  ty 

is  similarly  defined. 
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We  elicited  the  prior  of  a  computer  systems  engineer  from  the  C-17  Special 
Programs  Office  by  asking  his  opinion  about  the  FQ  computer  Mean  Time  Before 
Failure  (MTBF)  with  respect  to  two  reports.  The  first  report,  supplied  by  the 
C-17  manufacturer,  provides  target  numbers  for  each  “Logical  Replaceable  Unit” 
(LRU),  including  the  FQ  computer.  The  manufacturer  guarantees  that  the  MTBF 
for  each  LRU,  computed  at  the  end  of  the  acceptance  testing  period,  will  exceed  that 
LRU’s  target  number.  The  C-17  engineer  thought  it  was  highly  likely  (probability 
of  .9)  that  the  FQ  Computer  MTBF  would  exceed  the  target  number  (which  was 
1300  hours).  The  second  report,  supplied  to  the  C-17  Special  Programs  Office 
by  the  manufacturer’s  design  group,  contains  a  list  of  “mature”  MTBF  numbers 
for  each  LRU  being  evaluated.  These  numbers  represent  an  average  of  MTBF’s, 
by  LRU,  across  many  different  aircraft  which  have  similar  LRUs.  The  data  come 
from  maintenance  data  accumulated  following  the  acceptance  testing  periods  for 
each  aircraft  (hence  the  word  “mature”).  Since  these  numbers  come  from  mature 
aircraft,  the  C-17  engineer  thought  it  was  quite  unlikely  (probability  of  .1)  that  the 
FQ  Computer  MTBF  for  the  acceptance  testing  period  would  exceed  the  mature 
MTBF  number  (which  was  3167  hours).  Thus,  we  took  1300  hours  and  3167  hours 
to  be  the  .1  and  .9  quantiles  of  the  distribution  of  the  MTBF  for  the  FQ  Computer. 

If  X  ~  £(9)  (the  exponential  distribution  with  parameter  9)  and  Y  ~  5(r/ ’), 
then  MTBF  =  E(X  VY)  =  (1/(9  +  i/>))[  1  +  6 ft  +  t/>/0].  If  0,t/>  are  iid  ~  Q(a,b), 
then  the  .1  and  .9  quantiles  of  the  distribution  of  the  MTBF  are  equal  to  1300  and 
3167,  respectively,  when  a  =  6.04424  and  b  =  6835.32.  Note  that  if  the  conditional 
distribution  of  X  given  9  is  £(9 )  and  9  is  distributed  as  Q(a,b),  then  the  uncondi¬ 
tional  cumulative  distribution  function  of  X  is  F(t)  =  1  —  (6/(6  +  t))°,  which  is  a 
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“shifted  Pareto”  distribution  with  parameters  a  and  b.  The  prior  distributions  of  JV 
and  Y  are  each  shifted  Pareto  distributions  with  parameters  6.04424  and  6835.32. 
The  prior  distribution  of  X  V  Y  is  the  product  of  two  such  distributions. 

Of  particular  interest  to  the  C-17  engineers  is  the  question  of  how  the  lifelength  of 
a  future  FQ  computer,  as  well  as  the  lifelengths  of  a  future  A-bus  and  B-bus,  would 
be  distributed.  The  Bayes  approach  is  especially  well-suited  to  answer  such  a  ques¬ 
tion.  Figures  3.1,  3.2,  and  3.3  give  the  prior  and  posterior  cumulative  distribution 
functions  for  the  lifelengths  of  the  C-17  FQ  computer  (maximum  lifelength  of  the 
A-bus  and  B-bus),  the  A-bus,  and  the  B-bus,  for  the  cases  ag(H)  =  1,  ag(1t)  =  10, 
and  —  100.  Note  that  for  each  posterior  distribution,  the  prior  distribu¬ 

tion  is  a  shifted  Pareto  distribution  with  parameters  6.04424  and  6835.32  (or  the 
product  of  two  such  distributions  in  the  case  of  Figure  3.1).  Recall  from  Table  3.1 
that  three  real  deaths  were  observed  (at  times  43.4,  236.8,  244.0),  and  each  one 
generates  a  death  for  X,  Y,  and  X  V  Y .  Thus,  all  the  posterior  distributions  have 
atoms  at  these  three  failure  times.  Figures  3.4,  3.5,  and  3.6  give  the  prior  density 
and  a  representation  of  the  posterior  distribution  for  the  future  lifelengths  of  the 
maximum,  A-bus,  and  B-bus.  The  masses  at  these  three  failure  times  have  been 
removed  and  plotted  as  distinct  spikes,  with  their  masses  labeled  separately.  The 
reader  will  notice  that,  even  after  removing  these  three  atoms,  all  the  posterior  dis¬ 
tributions  have  large  “local  peaks”  just  to  the  left  of  the  removed  atoms.  These 
peaks  are  not  surprising,  since  the  algorithm  first  determines  which  component  will 
be  assigned  the  maximum  value,  and  then  assigns  a  failure  time  that  is  less  than 
the  maximum  failure  time  to  the  other  component’s  lifelength.  The  atoms  assigned 
to  the  lifelength  of  the  “non-maximum”  component  will  tend  to  be  relatively  close 
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to  the  observed  maximum  values  whenever  those  maximum  values  are  small,  which 
is  the  case  for  this  data  set.  It  is  interesting  to  note  that  the  distribution  of  the 
lifelength  of  the  B-bus  has  much  more  mass  to  the  left  of  236.8  hours  than  the  other 
two  distributions,  making  it  necessary  to  use  a  larger  scale  for  the  vertical  axis  in 
Figure  3.6.  This  can  be  attributed  to  the  two  pairs  of  observations  (A4.Y4)  and 
(X5,Ys)  (see  Table  3.1),  which  were  generated  due  to  a  failure  in  the  B-bus  while 
the  A-bus  was  still  functioning.  For  low  values  of  ag(1Z)  (the  cases  ag(TZ)  =  1  and 
ag (71)  =  10),  these  two  observations  cause  the  algorithm  to  assign  a  fairly  high  value 
to  the  conditional  probability  that  the  lifelength  of  the  A-bus  is  set  to  the  observed 
maximum  and  the  lifelength  of  the  B-bus  is  set  to  a  value  less  than  the  observed 
maximum.  Figures  3.7,  3.8,  and  3.9  give  the  density  estimates  for  the  lifelengths  of 
a  future  maximum,  A-bus,  and  B-bus,  with  the  spikes  smoothed.  In  these  figures, 
it  can  be  seen  that  the  masses  at  the  three  observed  FQ  computer  failure  times 
account  for  much  of  the  mass  in  the  posterior  distributions  for  the  cases  Qg(TZ)  =  1 
and  ae(H)  =  10. 

Due  to  the  small  sample  size  of  our  data  set,  we  were  able  to  run  the  Gibbs 
sampler  with  one  long  chain  of  length  500000,  skipping  every  tenth  value.  We  used 
the  output  of  of  the  algorithm  to  generate  50000  lifelengths  of  a  future  maximum, 
A-bus,  and  B-bus. 

The  means  of  the  posterior  distributions  of  the  lifelengths  of  the  FQ  computer, 
for  the  cases  a$(7?.)  =  1,  ag(7i)  =  10,  and  ag{7l)  =  100,  are  1254,  1230,  and  1230 
hours,  respectively.  At  this  relatively  early  stage  of  the  study,  we  conclude  that  the 
performance  of  the  FQ  computer  is  not  as  good  as  the  Air  Force  would  like  to  see, 
as  the  means  are  just  below  the  minimum  acceptable  MTBF.  However,  we  caution 
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that  the  study  will  continue  for  an  additional  two  years  beyond  the  close  of  our  data 
set,  and  that  some  of  the  early  failures  experienced  can  be  directly  attributed  to 
ongoing  design  changes.  It  will  be  interesting  to  rerun  the  algorithm  on  the  updated 
data  set  at  the  close  of  the  acceptance  testing  period. 
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Figure  3.1.  Prior  and  Posterior  Cumulative  Distribution  Functions  for  the  life 
length  of  C- 17  FQ  Computer  (Maximum  of  A-bus  and  B-bus),  a(R)  =  1 .  a{R)  =  10 
and  a (R)  =  100. 


Posterior  Distribution  Function  for  Lifelength  of  A-bus,  aipha(H)-i 


Figure  3.2.  Prior  and  Posterior  Cumulative  Distribution  Functions  for  the  life 
length  of  the  FQ  Computer’s  A-bus.  a(R)  =  1,  a(R)  =  10,  and  a(R)  =  100. 
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Prior  Distribution  Function  tor  Litelength  ol  B-bus 


Posterior  Distribution  Function  lor  Litelength  of  B-bus,  alpha(R)-i 
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Posterior  Distribution  Function  for  Litelength  of  B-bus,  alpha(R)-100 
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Figure  3.3.  Prior  and  Posterior  Cumulative  Distribution  Functions  for  the  life 
length  of  the  FQ  Computer’s  B-bus,  a(R)  =  1,  a{R)  =  10,  and  a(R)  =  100. 
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Table  3.1.  C-17  Fuel  Quantity  Computer  Data.  The  first  three  lines  contain 
observed  FQ  Computer  failure  times.  The  next  two  categories  (four  observations) 
occurred  when  the  FQ  computer  was  replaced  due  to  a  failure  in  one  bus.  The 
remaining  observations  occurred  because  no  more  data  is  available. 


Aircraft  Id 

Event 

hours 

Observation 

P-1 

FQ  Computer  failed 

43.4 

Xi  V  Yi  =  43.4 

P-1 

FQ  Computer  failed 

236.8 

X2  V  >2  =  236.8 

P-2 

FQ  Computer  failed 

244.0 

X3  V  V3  =  244.0 

T-l 

A-bus  and  B-bus  alive 

11.9 

B-bus  dead 

15.4 

y;  €  (11.9,15.4] 

FQ  Computer  replaced 

15.4 

X4  €  (15.4,  oo) 

P-4 

A-bus  and  B-bus  alive 

174.4 

B-bus  dead 

181.8 

Ys  6  (174.4,181.8] 

FQ  Computer  replaced 

181.8 

Xs  6  (181.8,00) 

T-l 

A-bus  and  B-bus  alive 

819.6  | 

X6  £  (819.6,  oo) 

no  more  data  available 

i 

Y6  £  (819.6,  oo) 

P-1 

A-bus  and  B-bus  alive 

85.0 

X7  €  (85.0, 00) 

no  more  data  available 

>7  €  (85.0, 00) 

P-2 

A-bus  and  B-bus  alive 

476.4 

X8€  (476.4,oo) 

no  more  data  available 

Ys  €  (476.4, 00) 

P-3 

A-bus  and  B-bus  alive 

24.5 

A9  (E  (24.5, 00) 

independent  software  failure 

Y§  €  (24.5, 00) 

P-3 

A-bus  and  B-bus  alive 

71.7 

*10  €  (71.7, 00) 

no  more  data  available 

*10  €  (71.7,  00) 

P-4 

A-bus  and  B-bus  alive 

68.4 

An  €  (68.4,oo) 

no  more  data  available 

Fii  €  (68.4, 00) 

P-5 

A-bus  and  B-bus  alive 

173.4 

*12  €  (173.4.00) 

no  more  data  available 

Y12  €  ( 1  ( 3.4, 00) 
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Figure  3.4.  Prior  and  Posterior  Density  Estimates  for  the  Ufelength  of  the  FQ 
Computer  (Maximum  of  A-bus  and  B-bus),  a(/?)  =  1,  a(R)  =  10,  and  a(i?)  =  100. 
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Prior  Density  lor  Lifeienglh  ol  A- bus 


Posterior  Density  for  Lilelength  of  a  Future  A-bus,  large  atoms  removed,  alpha-1 


Posterior  Density  tor  Lifelength  of  a  Future  A-bus,  large  atoms  removed,  alpha-10 


Posterior  Density  for  Lifelength  of  a  Future  A-bus,  large  atoms  removed,  alpha-100 
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Figure  3.5.  Prior  and  Posterior  Density  Estimates  for  the  lifelength  of  the  FQ 
Computer’s  A-bus,  a(R)  =  1,  a(R)  =  10,  and  a(R)  =  100. 
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Figure  3.6.  Prior  and  Posterior  Density  Estimates  for  the  lifelength  of  the  FQ 
Computer’s  B-bus,  a(R)  =  1,  a(R)  =  10,  and  a(R)  —  100. 
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Figure  3.8.  Prior  and  Posterior  Density  Estimates  for  the  lifelength  of  the  FQ 
Computer’s  A-bus,  a(R)  =  1 ,  a(R)  =  10,  and  a(R)  =  100. 
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